
    On the Distribution of Salient Objects in Web Images and its Influence on Salient Object Detection

    It has become apparent that a Gaussian center bias can serve as an important prior for visual saliency detection, which has been demonstrated both for predicting human eye fixations and for salient object detection. Tseng et al. have shown that the photographer's tendency to place interesting objects in the center is a likely cause of the center bias of eye fixations. We investigate the influence of the photographer's center bias on salient object detection, extending our previous work. We show that the centroid locations of salient objects in photographs of Achanta and Liu's data set in fact correlate strongly with a Gaussian model. This is an important insight, because it provides an empirical motivation and justification for integrating such a center bias into salient object detection algorithms and helps explain why Gaussian models are so effective. To assess the influence of the center bias on salient object detection, we integrate an explicit Gaussian center bias model into two state-of-the-art salient object detection algorithms. This way, first, we quantify the influence of the Gaussian center bias on pixel- and segment-based salient object detection. Second, we improve the performance in terms of F1 score, Fβ score, area under the recall-precision curve, area under the receiver operating characteristic curve, and hit rate on the well-known data set by Achanta and Liu. Third, by debiasing Cheng et al.'s region contrast model, we demonstrate by example that implicit center biases are partially responsible for the outstanding performance of state-of-the-art algorithms. Last but not least, as a result of debiasing Cheng et al.'s algorithm, we introduce a non-biased salient object detection method, which is of interest for applications in which the image data is unlikely to have a photographer's center bias (e.g., image data from surveillance cameras or autonomous robots).
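
    To make the center-bias idea concrete, here is a minimal sketch of an explicit Gaussian center-bias model combined with a saliency map. It is an illustration under assumptions, not the authors' implementation: the isotropic Gaussian, the sigma parameterization, and the multiplicative combination are all assumed.

```python
import numpy as np

def gaussian_center_bias(height, width, sigma=0.3):
    """Isotropic Gaussian prior centered on the image.

    sigma is given as a fraction of image size (an assumed
    parameterization; the paper fits its model to data).
    """
    ys = (np.arange(height) - (height - 1) / 2.0) / height
    xs = (np.arange(width) - (width - 1) / 2.0) / width
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))

def apply_center_bias(saliency_map, sigma=0.3):
    """Combine a saliency map with the Gaussian prior.

    Multiplicative combination is one common choice; debiasing an
    algorithm with an implicit center preference would, conversely,
    divide the prior out of its output before evaluation.
    """
    bias = gaussian_center_bias(*saliency_map.shape, sigma=sigma)
    biased = saliency_map * bias
    return biased / (biased.max() + 1e-12)  # renormalize to [0, 1]

# Usage: bias a toy saliency map.
sal = np.random.rand(240, 320)
print(apply_center_bias(sal).shape)  # (240, 320)
```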

    Biased Competition in Visual Processing Hierarchies: A Learning Approach Using Multiple Cues

    In this contribution, we present a large-scale hierarchical system for object detection that fuses bottom-up (signal-driven) processing results with top-down (model- or task-driven) attentional modulation. Specifically, we focus on the question of how the autonomous learning of invariant models can be embedded into a performing system and how such models can be used to define object-specific attentional modulation signals. Our system implements bi-directional data flow in a processing hierarchy. The bottom-up data flow proceeds from a preprocessing level to the hypothesis level, where object hypotheses created by exhaustive object detection algorithms are represented in a roughly retinotopic way. A competitive selection mechanism is used to determine the most confident hypotheses, which are used on the system level to train multimodal models that link object identity to invariant hypothesis properties. The top-down data flow originates at the system level, where the trained multimodal models are used to obtain space- and feature-based attentional modulation signals, providing biases for the competitive selection process at the hypothesis level. This results in object-specific hypothesis facilitation/suppression in certain image regions, which we show to be applicable to different object detection mechanisms. To demonstrate the benefits of this approach, we apply the system to the detection of cars in a variety of challenging traffic videos. Evaluating our approach on a publicly available dataset containing approximately 3,500 annotated video images from more than one hour of driving, we show strong increases in performance and generalization compared to object detection in isolation. Furthermore, we compare our results to a late hypothesis rejection approach, showing that early coupling of top-down and bottom-up information is a favorable approach, especially when processing resources are constrained.
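
    A toy sketch of the competitive selection step with top-down biasing, assuming simple multiplicative modulation and linear lateral inhibition; the hypothesis structure and update rule are illustrative stand-ins, not the system described above.

```python
import numpy as np

def biased_competition(confidences, bias, steps=50, inhibition=0.1):
    """Iterative competition among object hypotheses.

    confidences: bottom-up detection scores, shape (n,)
    bias: top-down attentional modulation, shape (n,)
          (>1 facilitates a hypothesis, <1 suppresses it)
    Each step, every hypothesis is excited by its biased input
    and inhibited by the summed activity of its competitors.
    """
    a = confidences * bias
    for _ in range(steps):
        lateral = inhibition * (a.sum() - a)  # inhibition from rivals
        a = np.clip(confidences * bias - lateral, 0.0, None)
    return a

scores = np.array([0.9, 0.8, 0.3])      # bottom-up hypotheses
bias   = np.array([0.5, 1.5, 1.0])      # top-down: favor hypothesis 2
print(biased_competition(scores, bias))  # hypothesis 2 wins
```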

    Modelling Visual Search with the Selective Attention for Identification Model (VS-SAIM): A Novel Explanation for Visual Search Asymmetries

    In earlier work, we developed the Selective Attention for Identification Model (SAIM [16]). SAIM models the human ability to perform translation-invariant object identification in multiple-object scenes. SAIM suggests that central to this ability is an interaction between parallel competitive processes in a selection stage and an object identification stage. In this paper, we apply the model to visual search experiments involving simple lines and letters. We present successful simulation results for asymmetric and symmetric searches and for the influence of background line orientations. Search asymmetry refers to changes in search performance when the roles of the target item and the non-target item (distractor) are swapped. In line with other models of visual search, the results suggest that a large part of the empirical evidence can be explained by competitive processes in the brain, which are modulated by the similarity between target and distractor. The simulations also suggest that another important factor is the feature properties of the distractors. Finally, the simulations indicate that search asymmetries can be the outcome of interactions between top-down (knowledge about search items) and bottom-up (features of search items) processing. This interaction in VS-SAIM is dominated by a novel mechanism, the knowledge-based on-centre-off-surround receptive field. This receptive field is reminiscent of classical receptive fields, but its exact shape is modulated by both top-down and bottom-up processes. The paper discusses supporting evidence for the existence of this novel concept.
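
    The knowledge-based on-centre-off-surround receptive field can be pictured as a difference of Gaussians whose shape is adjusted by a top-down term. In this toy rendering, the td_gain parameter and the rule of scaling the surround width are assumptions; VS-SAIM derives the actual shape from its stored object knowledge.

```python
import numpy as np

def on_centre_off_surround(size, sigma_c=1.0, sigma_s=3.0, td_gain=1.0):
    """Difference-of-Gaussians receptive field.

    td_gain stands in for top-down, knowledge-based modulation:
    here it simply scales the surround width (an assumed rule).
    """
    r = np.arange(size) - size // 2
    xx, yy = np.meshgrid(r, r)
    d2 = xx**2 + yy**2
    centre = np.exp(-d2 / (2 * sigma_c**2))
    surround = np.exp(-d2 / (2 * (sigma_s * td_gain)**2))
    # Normalize each lobe so excitation and inhibition balance.
    return centre / centre.sum() - surround / surround.sum()

k = on_centre_off_surround(15, td_gain=1.5)
print(k.sum())  # ~0: balanced centre-surround kernel
```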

    Dynamic, Task-Related and Demand-Driven Scene Representation

    Humans selectively process and store details about their vicinity based on their knowledge about the scene, the world, and their current task. In doing so, only those pieces of information that are required for solving the given task are extracted from the visual scene. In this paper, we present a flexible system architecture along with a control mechanism that allows for a task-dependent representation of a visual scene. Contrary to existing approaches, our system is able to acquire information selectively according to the demands of the given task and based on the system's knowledge. The proposed control mechanism decides which properties need to be extracted and how the independent processing modules should be combined, based on the knowledge stored in the system's long-term memory. Additionally, it ensures that algorithmic dependencies between processing modules are resolved automatically, utilizing procedural knowledge which is also stored in the long-term memory. By evaluating a proof-of-concept implementation on a real-world table scene, we show that, while solving the given task, the amount of data processed and stored by the system is considerably lower than in the processing regimes used in state-of-the-art systems. Furthermore, our system only acquires and stores the minimal set of information that is relevant for solving the given task.
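
    The automatic resolution of dependencies between processing modules can be pictured as a topological sort over procedural knowledge stored as a dependency graph. A minimal sketch, assuming a dict-based graph with illustrative module names; it does not reproduce the system's actual knowledge representation.

```python
from graphlib import TopologicalSorter

# Hypothetical procedural knowledge: each module lists the
# modules whose outputs it needs (names are illustrative).
dependencies = {
    "segment_objects": {"capture_image"},
    "estimate_color":  {"segment_objects"},
    "estimate_pose":   {"segment_objects"},
    "grasp_planning":  {"estimate_pose", "estimate_color"},
}

def modules_for_task(goal, deps):
    """Return only the modules needed for `goal`, in executable order."""
    needed, stack = set(), [goal]
    while stack:  # walk the dependency graph backwards from the goal
        m = stack.pop()
        if m not in needed:
            needed.add(m)
            stack.extend(deps.get(m, ()))
    order = TopologicalSorter({m: deps.get(m, set()) for m in needed})
    return list(order.static_order())

# Demand-driven: only the modules the task needs are scheduled.
print(modules_for_task("estimate_pose", dependencies))
# ['capture_image', 'segment_objects', 'estimate_pose']
```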

    The time course of exogenous and endogenous control of covert attention

    Studies of eye movements and manual responses have established that rapid overt selection is largely exogenously driven toward salient stimuli, whereas slower selection is largely endogenously driven to relevant objects. We use the N2pc, an event-related potential index of covert attention, to demonstrate that this time course reflects an underlying pattern in the deployment of covert attention. We find that shifts of attention that occur soon after the onset of a visual search array are directed toward salient, task-irrelevant visual stimuli and are associated with slow responses to the target. In contrast, slower shifts are target-directed and are associated with fast responses. The time course of exogenous and endogenous control provides a framework in which some inconsistent results in the capture literature might be reconciled; capture may occur when attention is rapidly deployed.

    Explaining efficient search for conjunctions of motion and form: Evidence from negative color effects

    Dent, Humphreys, and Braithwaite (2011) showed substantial costs to search when a moving target shared its color with a group of ignored static distractors. The present study further explored the conditions under which such costs to performance occur. Experiment 1 tested whether the negative color-sharing effect was specific to cases in which search showed a highly serial pattern. The results showed that the negative color-sharing effect persisted for a target defined as a conjunction of movement and form, even when search was highly efficient. Experiment 2 examined the ease with which participants could find an odd-colored target amongst a moving group. Participants searched for a moving target amongst moving and stationary distractors. In Experiment 2A, participants performed a highly serial search through a group of similarly shaped moving letters. Performance was much slower when the target shared its color with a set of ignored static distractors. The exact same displays were used in Experiment 2B; however, participants now responded "present" to targets that shared the color of the static distractors. The same targets that had previously been difficult to find were now found efficiently. The results are interpreted within a flexible framework for attentional control. Targets that are linked with irrelevant distractors by color tend to be ignored; however, this cost can be overridden by top-down control settings.

    We get the algorithms of our ground truths: Designing referential databases in digital image processing.

    This article documents the practical efforts of a group of scientists designing an image-processing algorithm for saliency detection. By following the actors of this computer science project, the article shows that the problems often considered to be the starting points of computational models are in fact provisional results of time-consuming, collective and highly material processes that engage habits, desires, skills and values. In the project being studied, problematization processes lead to the constitution of referential databases called 'ground truths' that enable both the effective shaping of algorithms and the evaluation of their performances. Serving as important common touchstones for research communities in image processing, ground truths are inherited from prior problematization processes and may be imparted to subsequent ones. The ethnographic results of this study suggest two complementary analytical perspectives on algorithms: (1) an 'axiomatic' perspective that understands algorithms as sets of instructions designed to solve given problems computationally in the best possible way, and (2) a 'problem-oriented' perspective that understands algorithms as sets of instructions designed to computationally retrieve outputs designed and designated during specific problematization processes. Whereas the axiomatic perspective puts the emphasis on the numerical transformations of inputs into outputs, the problem-oriented perspective puts the emphasis on the definition of both inputs and outputs.

    Modelling Visual Neglect: Computational Insights into Conscious Perception

    Background: Visual neglect is an attentional deficit that typically results from a parietal cortex lesion and sometimes from a frontal lesion. Patients fail to attend to objects and events in the visual hemifield contralateral to their lesion during visual search. Methodology/Principal Findings: The aim of this work was to examine the effects of parietal and frontal lesions in an existing computational model of visual attention and search and to simulate visual search behaviour under lesion conditions. We find that unilateral parietal lesion in this model leads to symptoms of visual neglect in simulated search scan paths, including an inhibition of return (IOR) deficit, while frontal lesion leads to milder neglect and to more severe deficits in IOR and perseveration in the scan path. During simulations of search under unilateral parietal lesion, the model's extrastriate ventral stream area exhibits lower activity for stimuli in the neglected hemifield compared to that for stimuli in the normally perceived hemifield. This could represent a computational correlate of differences observed in neuroimaging for unconscious versus conscious perception following parietal lesion. Conclusions/Significance: Our results lead to the prediction, supported by effective connectivity evidence, that connections between the dorsal and ventral visual streams may be an important factor in the explanation of perceptual awareness.
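
    One way to picture the lesion simulation: attenuate spatial priority in the hemifield contralateral to the simulated lesion and let a winner-take-all pick the next fixation. A toy sketch; the linear attenuation and the severity parameter are assumptions, not the published model's dynamics.

```python
import numpy as np

def lesioned_priority(saliency, lesion_side="right", severity=0.7):
    """Attenuate priority in the hemifield contralateral to the lesion.

    A right-parietal lesion weakens the *left* visual field, so
    selection is biased rightward (the hallmark of neglect).
    severity in [0, 1] scales the attenuation (assumed linear).
    """
    h, w = saliency.shape
    weight = np.ones((h, w))
    if lesion_side == "right":
        weight[:, : w // 2] *= 1.0 - severity  # left field suppressed
    else:
        weight[:, w // 2 :] *= 1.0 - severity  # right field suppressed
    return saliency * weight

rng = np.random.default_rng(0)
sal = rng.random((10, 10))
prio = lesioned_priority(sal, "right")
y, x = np.unravel_index(prio.argmax(), prio.shape)  # winner-take-all
print("selected location:", (y, x))  # tends to fall in the right field
```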